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Abstract. We present a new analysis of previously pub- 
lished optical and radio data sets of the gravitationally 
lensed quasar 0957+561 A,B with the aim of determining 
the time delay between its two images. We use a non- 
parametric estimate of the dispersion of the combined 
data set where, however, we only make use of alternat- 
ing neighbours in order to avoid windowing effects. From 
the optical data a time delay of (415 ± 32) days and from 
the radio data a delay of (409 ± 23) days is suggested. 
We demonstrate a considerable sensitivity of different de- 
lay estimation procedures against the removal of only a 
few observational data points or against smoothing or de- 
trending of the original data sets. The radio data give us 
formally a slightly more precise value for the time delay 
than the optical data. Also, due to the lack of window- 
ing effects, the result obtained for the radio data can be 
considered as somewhat more reliable than the delay de- 
termined from the optical data. 

Key words: Methods: statistical - time series analysis 
- Quasars: 0957+56lA,B - gravitational lensing - time 
delay 



1. Introduction 

The time delay between the images of a gravitationally 
lensed object is of great astrophysical interest since it 
may be used to determine the Hubble parameter as well 
as the mass of the lens (Refsdal 1964a, 1964b] , Borgeest 
1986|, Falco et al. |1991b|). The double quasar 0957+561 



A,B (Walsh et al. |1979| , Young [1980| Falco [1992]) is up 
to now the only gravitational lens system for which seri- 
ous attempts have been made to determine the time de- 
lay r between its images. However, the results are still 
controversial, with suggested tim e dela ys of between 376 
and 657 days (Florentin-Nielsen |l984J Schil d fc Cholfin 
1986 , Gondhalekar 1986| , Gorenstein et al. 1988 , Lehar 



et al |1989[ , Vanderriest |1989| , Schild |1990|, Falco et al. 



1991a, Lehar et al. 1992, Roberts et al. 1991, Press et al. 



19924 |1992b| , Beskin & Oknyanskij |1992j). U sing a n elab- 



orate statisti cal method (see Rybicki et al. 1992 ) Press 



et al. (1992a, below PRHa), recently obtained a value of 



Send offprint requests to: R. Kayser 



(536 + 12) days from the optical lightcurve, which is seem- 
ingly in good agreement with the value obtained from the 
radio data by the same authors (Press et al. 1992b , be- 
low PRHb). They conclude that "delays less than about 
475 da ys are strongly exclu ded" . It is also often claimed 
(Lehar |l992j , Roberts et al. |l99l| , PRHb) that the radio 
data decisively exclude a delay of around 415 days, a value 
favoured by other authors (Vanderriest et al. 198£, Schild 
1990). 

The time delay obtained by PRHa is quite close to 
1.5 years, a value for which windowing effects due to the 
uneven sampling of the optical data are expected to be 
strongest (Vanderriest et al. 1992J ). 

We here present results of a careful re-analysis of the 
same data sets as used in PRHa and PRHb using simple 
explorative type statistical methods. The main emphasis 
of our work is the evaluation of two competing hypothet- 
ical time delays: 415 and 536 days. 

For the combined data set generated from the data of 
image A and the data of image B, time shifted by r, we 
basically estimate the dispersion D 2 of the scatter around 
the unknown mean curve. The true time delay between 
the images should then show up as a minimum in the 
dispersion spectrum D 2 (r) . It is our aim to determine the 
dispersion of the combined data due to the alternation be- 
tween the two light curves, and not the dispersion within 
each lightcurve. In order to avoid strong windowing effects 
we therefore take into account only alternating neighbour- 
ing pairs in the combined data set, i.e. only pairs where 
one point is from A and the other one from B, respectively. 
It should be noted that for r = (n + 0.5) years (where n is 
integer) the dispersion D 2 (t) would otherwise be strongly 
dominated by the inner dispersion of the two original light 
curves, naturally leading to pronounced minima. 

In Section 2 of the paper we introduce the basic statis- 
tical techniques used. In Section 3 we present the results 
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of a detailed analysis of the light curves of the double 
quasar 0957+561 A,B. We here use the same data sets 
as PRHa and PRHb to allow a comparison of the results 
and to understand the problems leading to the different 
results. In the concluding part we try to summarize the 
work done on time delay estimation up to now and to de- 
lineate some inner difficulties of the problem under study. 
Technical details of the nonparametric dispersion estima- 
tion are treated in the Appendix. 

2. Data analysis methods 

2.1. The optimal prediction 



In the papers PRHa, PRHb and Rybicki et al. ( [1992[ ) 
a rather complex procedure for an optimal interpolation 
of the lightcurves was developed. The procedure itself is 
practically the same as used in geophysical research un- 
der the name of kriging (Cabrera et al. 1990). The use 



of sound statistical arguments and validation of the re- 
sults by Monte-Carlo simulations allowed the authors to 
draw conclusions on the probable time delay with a (for- 
mally) high level of confidence. Since our analysis did not 
fully confirm these results, we reimplemented the proce- 
dures used in PRHa and PRHb and used some simple 
procedures to check the inner consistency of the results. 
We started from a very simple computation of the num- 
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Fig. 1. Window function for optical data 

ber of mixed data pairs in the combined sequence (data 
set B shifted by r coalesced with data set A). In Fig. [I] 
the number of neighbouring data pairs of mixed type (one 
from data set A, one from data set B) is plotted against 
the trial time delays. This simple statistic measures well 
the total overlap between the two data sets for different 
shifts in time of the B component. Inside of the range of 
probable delays quoted in PRHa two strong minima at 
r = 527 da,ys,NA,B — 43 and at r = 535 days, Na,b = 43 
are found. There is a strong similarity between the win- 
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Fig. 2. Distribution of dispersion ratios for artificial light 
curves generated by Monte-Carlo method. The value obtained 
from the original optical data is marked with a dash. 



dow function plot in PRHa and our simple statistic. It 
is quite obvious (and was noticed also by PRHa) that 
the coincidence of minima in the window function and 
in the % 2 spectrum in PRHa is a rather serious signal 
about the complexity of problems one encounters with the 
lightcurves of QSO 0957+561. 

After reimplementation of the PRHa procedures we 
carried out the following simple experiment. We divided 
the data points of the combined data set (computed with 
t = 536 days) into two subsets: relevant and irrelevant. 
Into the first subset we included observations which had 
at least one nearest neighbour from alternate original data 
sets, and into the second subset we included observations 
for which both neighbours had the same origin as the data 
point under consideration. After computation of the opti- 
mally approximated (predicted) values for all data points, 
we were now able to compute separately prediction errors 
for relevant and irrelevant observational points. We found 
that the dispersion for relevant points was by a factor of 
five larger than that for irrelevant points. 

This result lead us to another interesting test. We re- 
peated Monte-Carlo simulation runs with different time 
delay values exactly as described in PRHa and used our 
data set subdivision procedure for each of the generated 
artificial data sets. In Fig. [2] the resulting distribution of 
dispersion ratios is shown. The mean ratio is slightly less 
than unity. This fact can be explained in the following way: 
the overlap regions for the two original data sets are pop- 
ulated in the mean two times more densely than regions 
where only one data set is seen. For the sparsely populated 
regions there is a tendency for the optimal prediction pro- 
cedure to put the prediction curve through the observed 
points and consequently in these regions the dispersion 
for prediction errors is as a rule less than for overlapped 
regions. However, as can be seen from the distribution di- 
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agram, the typical decrease is by far not as severe as for 
the original data set (marked in Fig. 0). 

The conclusion is discouraging: the Monte-Carlo pro- 
cedure proposed in PRHa, which was considered as ex- 
perimentum crucis for their time delay computations does 
not model the actual data set at hand, but it proves only 
that for data sets with certain regular statistical proper- 
ties their procedure works quite well. The real data set is 
unfortunately more complex, this is probably caused by 
microlensing. 

Our negative experience with the optimal prediction 
procedure lead us to the conclusion that to understand 
all the intricacies involved with the actual observations 
we must seek some extremely simple statistics whose use 
is not shadowed by the complexities of the mathematical 
procedures involved. Our aim was not so much to prove 
that one or another delay value is right or wrong, but to 
understand how the different values are "cooked" from the 
data set. 

2.2. Use of nonparametric dispersion estimates for shift 
analysis 

Let Ai = g(ti) + ex(ti),i — 1,2,..., Na and Bj — 
(g(tj + t) — a) + tB(tj),j = 1, 2, ... , Nb be two time se- 
ries which are different measurements of the original light 
curve git), ft is assumed that for channel B the obser- 
vations Bj are amplified with an unknown amplification 
ratio a. Observed values contain also observational errors 
eA(ti), £b(£i) for which we may have certain dispersion es- 
timates <5^(tj),<5|(ij) or corresponding statistical weights 
Wa(U), Wb(£j)- Our task is to estimate the time shift pa- 
rameter r from observed time series. 

For every fixed set of values r, a we can construct a 
combined data set from both observed series 



Cfc(ifc) 



At 
B, 
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(1) 



and estimate its dispersion D 2 (r, a) with the simple non- 
parametric method described in the Appendix. The dis- 
persion spectrum is now defined as: 



D 2 (t) = min D 2 (t, a) 



(2) 



Statistical weights can be introduced into our scheme 
by using the usual combined weights for every pair of ob- 
servations. If the statistical weight of channel A is W, and 
that of channel B is Wj then the dispersion for the differ- 
ence Ai — (Bj + a) is equal to 



1 



1 



(3) 



and the squared difference of the two observations must 
be consequently included with weights 



TI'; 



1,3 



WjWj 

Wi + Wi 



(4) 



A correct normalization is obtained by dividing the corre- 
sponding sums of squared differences by the total sum of 
weights. 

By choosing different selection windows (see Ap- 
pendix) we can now compute different statistics and com- 
pute corresponding dispersion spectra. In several numer- 
ical experiments we tried to understand the behaviour of 
many different variants depending on features in the data 
sets at hand. However, to present our results we use only 
one, and by the way the simplest algorithm: only neigh- 
bouring data points are considered as near enough, to be 
included in the cross-sums. We even do not specify an 
upper limit S for a maximum allowed time separation be- 
tween two sequential points. This approach allows us to 
avoid any free parameters in our procedures, and makes 
the calculations simple enough to be well reproduced by 
other researchers. 

Correspondingly, we use in the following analysis only 
two simple statistics. The first one 



D all = min 






(5) 



measures the general dispersion of the combined data set. 
The second statistic 



£>A,B = min 



Sfc=i Wk.k+iGk{Ck+i — CkY 
2 2t=i Wk,k+iGk 



(6) 



where Gk — 1 only when Ck+i and Ck are from different 
data sets and Gk — otherwise, measures the dispersion 
in the overlap areas of the combined light curve. 

2.3. Computation of error bars for time delays 

To get an idea about error bars for the minima in the 
dispersion spectra we used a simple bootstrap procedure. 
We applied a 7-point median filter to smooth the combined 
light curve and reshuffled the corresponding residuals 1000 
times to generate bootstrap samples. The full procedure 
turned out to be not very dependent on the median filter 
length, we therefore present only results for the particular 
value of 7. 

It must be stressed that the somewhat arbitrary value 
for the filter length was used only to estimate mean square 
errors of the computed values of the minima, the minima 
themselves do not depend on any prechoosen numerical 
constant. 

It is possible to get much more narrow limits to the 
computed shift estimates by using longer smoothing fil- 
ters, but we wanted to be as conservative as possible. 

2.Jj.. Influential data segments 

To demonstrate the effect of removing short segments of 
data from the analysis we used the following simple pro- 
cedure. For any starting index value I in the row of ob- 
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servations, for any skip length m and for any pair of hy- 
pothetical shift values £1 , £2 which are interesting for us 
we can compute the statistic (gain of hypothesis r = £1 
against t = £ 2 ) 



I(l,m,Si,S2) = D\ B (l,m,Si) - D 2 AB (l,m,S 2 ) 



(7) 



where the dispersions are computed for data sets with 
skips (start of data skip from I, length of skipped sub- 
set m points). In various experiments we used different 
skipping schemes: sometimes we skipped observing points 
for both channels, sometimes only from one channel. The 
statistic 1(1, m, Si, £2) allowed us to compare the influence 
of skipped values for two different hypothetical time de- 
lays. Maxima and minima in 1(1, m, Si, £2) indicate skips 
which favor hypothesis S2 and hypothesis £1 , respectively. 
This kind of approach is quite common in standard 
regression analysis where robustness of the predicted curve 
is tested against removal of a small number of observed 
values (outliers). In our case we assume that the correct 
estimate for the time delay should not depend strongly 
on a small number of observations. If it does, we conclude 
that some of the data points are real outliers, or that the 
available data sets do not contain enough information to 
discriminate between the two hypotheses. 

3. Analysis of observed data sets 

3.1. Optical data 

We used the same data as PRHa, i.e. the optical data pub- 



lished by Vanderriest et al. ( |1989[ ). The D 2 all (Eq. g) and 



D A b (Eq. o) statistics for the original optical data set are 
depicted in Fig. g. The strongest local minima in the D\ B 
spectrum is at a shift of 525 days and bootstrap calcula- 
tions give us a standard error value of 42 days, so that our 
result is in good agreement with the value given in PRHa. 
It is interesting to note that the mean shift for the boot- 
strap trials occurred to be biased to a shift of 534 days. 
There is an additional depression around 415 days which 
is, however, somewhat weaker than the main "peak". It 
is important to note that the D\ B spectrum stays every- 
where well above the D 2 ^ spectrum, i.e. the dispersion in 
the overlap region is systematically higher than in the in- 
tervening regions. We therefore conclude that there must 
be a certain amount of additional mutual variability be- 
tween two light curves (maybe due to microlensing) , so 
that the model of shifted but otherwise similar images, 
which has been used here and in PRHa, is oversimplified. 
To get an idea about the form of the real depression 
in the dispersion spectra we proceed with a simple nu- 
merical experiment. We combined the A and B datasets 
with the best shift and the best a value, filtered the com- 
bined data set with a 7-point median filter, and decom- 
pose the filtered set back into two subsets. The spectra 
for the real data set and for the artificial data set (with 
a delay of 536 days) are given in Fig. [|. The spectrum 
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Fig. 3. Dispersion spectra for original optical data. Upper 
curve is D\ b (t), lower curve is D^ n (r) 
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Fig. 4. Dispersion spectra for original and artificial data 
sets. The two upper curves are D\ b (t)±D(t) 2 for the original 
data, the lower curve is D% b (t) for the artificial data 



for the real data set is depicted in the two nearby curves. 
The upper curve is the dispersion spectrum plus the es- 
timated subsampling error for the particular shift value 
and the lower curve is the dispersion spectrum minus the 
subsampling error estimate. Thus, the difference between 
the two curves demonstrates the uncertainty due to the 
subsampling. The minimum value in the spectrum for the 
artificial set (which is -Dtrend ^ we use the notation from 
the Appendix, see Eq. (A9)) is comparable to the largest 
difference between the upper and the lower limit for the 
original spectra. It is therefore clear that our dispersion 
estimation scheme works here (at least in the region of 
the main minimum) nearly optimally. It can be seen that 
the "line profiles" of the real and the artificial spectra dif- 
fer significantly. The feature in the real spectrum is more 
fluctuation like than the wide and systematic depression 
in the artificial spectrum. 
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Fig. 5. Gain of hypothesis r = 536 days against hypothesis 
r = 415 days for different locations of the three point gap in 
the original data set 



Fig. 6. Comparison of the two dispersion spectra obtained by 
different skips from the data sets. The three point gaps started 
from I = 58 for the bold curve and from I = 99 for the normal 



We then investigated the robustness of our first so- 
lution against skipping short segments from both light 
curves. The strong motivation to do this kind of analy- 
sis stems from the remark of PRHa that the "feature at 
epoch 6200" can bias estimation of the shift value towards 
415 days. 

In Fig. pi the results of a typical sensitivity analysis 
are shown. The gain 1(1, 3, 536, 415) (Eq. M) of hypothesis 
t = 536 days against r = 415 days is plotted against the 
starting index of a three point skip in the original data set. 
In Fig. ra the corresponding spectra are plotted: one with 
the most favorable (for hypothesis r = 536 days, minimum 
at / = 99) skip, and one with the most unfavourable (max- 
imum at I = 58). It is seen that the feature in the region 
of 536 days undergoes a rather dramatic change, whereas 
the feature around 415 days remains relatively stable. By 
skipping only three consecutive time points (the A and 
B channels are observed simultaneously) we can strongly 
emphasize the 536 feature, or remove it completely! There 
is no need to say that for longer gaps the effect of skips is 
even more drastic. 

From these experiments we concluded that it is rea- 
sonable to investigate the high frequency behaviour of the 
optical data sets. Since our investigation is of explorative 
type, we postulated freely (at least for the moment) that 
both light curves are corrupted by independent low fre- 
quency components (which can be physically interpreted 
as microlensing). We assumed that these components can 
be modeled by low degree polynomials. In Figs, fj] and 
pi the D\ b spectra for data sets which were obtained by 
removing polynomial trends with degrees 1 — 9 from the 
original data sets are shown. The feature around 415 days 
is much stronger than the feature around 536 days for de- 
grees 1,3,5,6,7,8,9. For degrees 2 and 4 (see Fig. 0) the 
minima for both hypothetical shifts are comparable. It is 
evident that starting from degree 5 the behaviour of dis- 
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Fig. 7. Dispersion spectra for the detrended data sets. On the 
plot the results for degrees 2 and 4 are depicted. 



persion spectra is quite similar. For our further presenta- 
tions we arbitrarily choose polynomials with degree 9 and 
in the following discussion we use as detrended data sets 
the corresponding least squares fit residuals. The compu- 
tations with other degrees (from 5 to 8) gave essentially 
the same results. In Fig. || the D\ xl and D\ B spectra for 
the detrended data are depicted. 

It is worth to note that the minimum of D\ B now 
nearly coincides with the value of D^ n for the same shift. 
Correspondingly the dispersion for the overlap areas and 
for the intervening areas are nearly the same for the best 
shift! The excess dispersion between the two light curves, 
which is probably due to microlensing, is removed. At least 
from this point of view the r = 415 days solution is much 
more consistent than the solution with r = 536 days. 

We then repeated our experiment with the artificial 
data set. In Fig. nfl the overall correspondence between 
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Fig. 8. Dispersion spectra for the detrended data sets. On the 
plot the results for degrees 1, 3, 5, 6, 7, 8, 9 are depicted. 



0.0060 



0.0050 




0.0010 



100 200 300 400 500 600 700 800 
Shift 



Fig. 9. Dispersion spectra for the data sets detrended by poly- 
nomials of degree 9. The upper curve is -Da,b(t) and the lower 
curve is £>au(r). 



the two spectra is striking. It seems, however, that the 
exact value of 415 days for the very narrow dip of the 
real spectrum can be slightly out of place, because it is 
"cooked" by high frequency local fluctuation just on the 
bottom of the feature. 

The sensitivity analysis for the detrended data sets is 
also revealing. In Figs. O and n3 it can be seen that the 
feature around 536 days can be amplified significantly or 
can be removed nearly completely by skipping three con- 
secutive observing nights from the original data sets. On 
the other hand the depression around 415 is rather stable. 
The bootstrap procedure gives us for the detrended data 
a standard error of ±32 days for the alternative solution 
of 415 days. The distribution of 1000 trials is depicted in 
Fig.0. 

There are two competing values for the time delay, 
both of which we were able to derive from optical data 



Fig. 10. Dispersion spectra for detrended (with 9-degree poly- 
nomial) and artificial data sets. The upper curve is D\ b (t) for 
detrended data set and the lower curve is D| b (t) for artificial 
data set. 
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Fig. 11. Gain for hypothesis r = 536 against hypothesis 
r = 415 for detrended data sets with three consecutive data 
points removed. 



using exactly the same techniques. We hoped that the 
analysis of the radio data could have given us clues as to 
which of them should be preferred. 

3.2. Radio data 



We used the radio data published by Lehar et al. (1992) 



exactly in the manner they were used in PRHb. The ob- 
servations marked in PRHb by an asterisk were skipped 
and the other observed values were transformed to loga- 
rithmic scale. The D^ n and D\ B spectra (Eqs. g and g) 
for the radio data are depicted in Fig. [14]. The total min- 
imum at (533 ± 40) days (error bars from bootstrapping) 
is again in very good agreement with the result of PRHb. 
This coincidence is so good that one may well be inclined 
to drop the r = 415 day hypothesis completely. The plot 
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Fig. 12. Dispersion spectra for the two skipping schemes with 
maximum contrast between two hypotheses. The three point 
gaps started from I — 58 for the bold curve and from I = 99 
for the normal curve. 
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Fig. 14. Dispersion spectra for the original radio data sets. 
The bold curve is D\ B ( r ) an d the normal curve is D^ n (r). 
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Fig. 15. Radio data sets combined for the best shift and pa- 
rameter a. The bold curve is A, the normal curve is shifted B 
Fig. 13. Distribution of shifts for the bootstrap simulations data 
for the optical data. 



of two light curves with appropriate shift and parameter 
a (Fig. [L5|) fits well into this picture. 

Nonetheless, we applied to the radio data some of the 
tests already used for the optical data. We first repeated 
the experiments with artificial data sets generated from 
the combined original data sets by 7-point median filter- 
ing. In Fig. [16] the plot of the original and the artificial 
spectra are shown. The results are not very convincing, 
neither pro nor contra. However, it seems that the arti- 
ficial spectrum (with r = 533 days) has a slightly more 
pronounced minimum. 

We then experimented with various skipping schemes, 
to look how stable our results are against removal of 
a small number of observations. It was quite surprising 
to find that the removal of only two successive outlying 
points from the B curve can change the situation dramat- 
ically, see Fig. n7\ The lower curve is the evaluation of the 



r = 536 days hypothesis against the r = 415 days hypoth- 
esis, when removing pairs of observations from only the B 
curve. The upper curve depicts the same statistic for the 
data with two points removed from the B data set (the 
skipped points are from April 10, 1990 and May 7, 1990). 
It is seen that the data set with skips in the B curve is 
relatively stable against further removing of time points. 



In Fig. 18 the original data points for the A and B curve 



are depicted, with linear interpolation between the points. 
The "bad" points are exactly the points which cause the 
strong fluctuation at the very end of the B data set. 

We proceed now with data set B without the two "bad" 
points. It is no surprise to see that the main minimum of 
the D\ B spectrum moves to r = 409 days and that the 
overall correspondence between the artificial spectrum and 
the real one is also good (see Fig. 



19). Quite pleasing is 



also the plot of the optimally shifted B curve with two 
skipped points and the original A curve (see Fig. |2CJ ). 
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Fig. 16. Comparison of the dispersion spectra for the original 



radio and artificial data sets. The bold curve is D\ B (r) for 



the original radio data, the normal curve is D\ B (r) for the 
artificial data. 
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Fig. 17. Effect of data skipping to £>^ B (536) - D 2 AB {415). 
The bold curve depicts the gain (Eq. R) computed for the orig- 
inal radio data sets. The normal curve depicts the gain com- 
puted for the radio data with two skipped outlying points from 
B curve. 



Bootstrapping gives us an error of ±23 days for the 
shift, thus the correspondence between the results ob- 
tained from the optical and the radio observations is rather 
good. The error value must not be taken too seriously, 
since the use of a 7-point median filtering to build the 
mean curve estimates is somewhat arbitrary. However, it 
can be understood as a reasonable guess. 



4. Discussion 

Let us first take the original optical and radio data sets 
as they are. We were able to reproduce the results of the 
rather complex analysis in PRHa and PRHb, using ex- 
tremely simple statistics. However, the simple formulation 



Fig. 18. Linearly interpolated observations of the A and B 
curve. The upper curve is B, the lower curve is A. 
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Fig. 19. Dispersion spectra for the radio data sets with skipped 
two time points from B curve. The bold curve is D\ b (t), 
the flat normal curve is -Daii( r ) f° r radio data with skips, the 
steeper normal curve is the corresponding dispersion for arti- 
ficial data. 



of our procedures also allowed to get additional insights 
into what is actually going on. Wc now summarize the pro 
and contra arguments. 



4-1. Contra r = 536 days 

The feature around r = 536 days is located near the ab- 
solute minimum of the window function in the optical dis- 
persion spectra and does not have a profile which is ex- 
pected from the artificial curves. There is a large excess for 
the dispersion in the overlapped regions of the shifted light 
curves, and this is revealed using both methods of analysis. 
The last fact also invalidates the Monte-Carlo results in 
PRHa because their data set models are too regular com- 
pared to the real data sets. This is well illustrated in Fig. 
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Fig. 20. Optimally shifted radio observations with two skips 
in B curve. The bold curve depicts A data and the normal 
curve shifted B data. 

g. The dispersion spectra in the 536 days region are rather 
sensitive to skipping few data points and detrending. 

4-2. Pro t = 536 days 

Original data sets are used. The statistically sound opti- 
mal prediction method and simple explorative type analy- 
sis method give essentially the same results. No additional 
or excessively metaphysical arguments are involved in the 
data analysis. Good correspondence between optical and 
radio results. 

4-3. Contra r = 415 days 

The good looking results are obtained after somewhat ar- 
bitrary detrending and removing part of the data from 
analysis. From the purist's point of view this is unaccept- 
able. 



Fig. 21. Dispersion spectra for the extended data set provided 

?a b( t )i the lower curve 
is D\ b (t) for the corresponding artificial data set with time 
shift r 



to us by R. Schild. The upper curve is D k ^\t) 
") for the 
415 days. 



any skips is plotted against the spectrum of the artificial 
data set which was computed by 7-point median filtering 
of the coalesced original data (with B curve shifted by 
415 days). It can be seen that the feature around 536 
days is practically absent and the depression around 415 
days is well pronounced. This result is in good agree ment 
with the analysis mad e by S child and Thomson ( 1993 ) and 
Thomson and Schild fll993| ). 

The number of observations in the extended data set 
is now much larger and probably sufficient to address the 
rather complicated question about the statistical signifi- 
cance of the feature around 415 days. As we demonstrated 
in Section 2 the straightforward Monte-Carlo modelling 
can lead us to a wrong conclusion. Therefore we plan to 
use more complex waveform models for the data sets which 
will include also parts describing microlensing. 



4-4- Pro r = 415 days 

For both data sets the results of the explorative analysis 
are relatively consistent and stable. The dispersion for the 
overlapped region in the spectrum for the optical data is 
nearly the same as for other regions. The overall form of 
the depressions in the dispersion spectra mimic quite well 
model spectra. 

4-5. Future work 

After the first version of this paper was completed a more 
extensive optical data set became available to us. These 
data have been obtained by R. Schild and contain 707 time 
points from November 1979 to June 1993. Here we present 
only a very short analysis of the new data, details will be 
presented elsewhere. In Fig. £l] the dispersion spectrum 
for the full data set without any detrending and without 



5. Conclusions 

We introduced an extremely simple, but nevertheless sen- 
sitive method to seek probable time delay values for light 
curves which are assumed to originate from one and the 
same source. The method easily recovered results which 
were obtained by more complex and statistically sound 
procedures presented in PRHa and PRHb. However, the 
simplicity of the proposed method allowed us to get ad- 
ditional insights into the problems which originate from 
unfortunate spacing of the sampling points for the optical 
light curves. It allowed us also to produce an alternative 
consistent solution for the time delay problem. 

The dispersion spectra of the optical and radio data 
can show minima in various places depending on the 
method of analysis and preprocessing of the data. For the- 
oretical reasons delays r < can be excluded. The mini- 
mum near 1.5 years most probably results from a statis- 
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tical fluctuation which is amplified due to the windowing, 
since for this time shift the overlap of data from the two 
light curves is at a minimum and thus the least squares 
minimization has more freedom to fit free parameters. 

For the radio data we found that removing only two 
observations from B gives a stable spectrum for the dis- 
persions, with a best value for the shift near r = 409 days. 
This result is in good agreement with the value of r = 415 
days for the optical curve, which reveals itself after the 
detrending of the original data, or in other circumstances 
which allow to depress the instable minimum around 536 
days. 

We think that the time delay controversy on QSO 
0957+561 A,B is still not settled, but there is quite strong- 
evidence (especially if we take into account preliminary 
results obtained using the extended data set by R. Schild) 
that the time delay is in the region of 400-420 days. 
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A. Appendix. Nonparametric dispersion estimates 

A.l. Dispersion estimation through subsampling 

For a set of variables yi,i = 1,2, ... ,N we observe that 

N N 



2N 2 

V2 ££((»* -/*)-(» -*»))' 



i=\ .7=1 

N N 



2N 2 



l=\ 3 = 1 



1 N 1 N 

1=1 1=1 



(Al) 



If we set (j, = if J2i=i Vi t ne second term in the last ex- 
pression vanishes and consequently the first double sum 
can be considered as a standard (slightly biased) estima- 
tor for the sample dispersion. Taking into account that 
(y-i ~ Vj) 2 — {Vj ~ Vi) 2 arL d using another normalization 
we can obtain the standard unbiased estimate S 2 for the 
dispersion: 



N-l N 
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?— TT E E (w - %) 2 = 



i— 1 j—i-\-l 



N 



1 N 

»=i 



(A2) 



Let K — N(N — l)/2 be the total number of squares in 
the first sum, so that: 
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The sequence of half squares d^ = [yuk) — yj(k)) 2 /2 can 
be looked upon as a general finite sample from which we 
can draw random subsamples dyn, I = 1,2, ..., L. In this 
case the mean values 



s 2 = tX> 



k(i) 



(A4) 



i=i 



approximate the original estimate S 2 . How good are 
these approximations? From standard sampling theory 
(Cochran 1964[) we get the estimate for the dispersion of 
S 2 (subsampling dispersion): 



D(£* 2 ) = E(5' 2 - E5 2 



(If - L)D 2 
KL 



(A5) 



where E is mathematical expectation operator, D 2 is the 
dispersion for the full sample of half squares 



p2= Ek=i( d k-d) 2 
K-l 



(A6) 



and d is their mean value. We see that the approximation 
error vanishes if L approaches K. The important point is 
that even for small values of L the subsampling error is as 
small as D 2 /L. 

For the particular case of normally distributed y^s the 
half squares are distributed according to the scaled x 2 (l) 
law and consequently D 2 can be substituted by the value 
of 2a 2 where a is the standard deviation for the j/j or some 
of its estimates. 

Putting now all things together we get the main result 
of this paragraph: the dispersion S 2 can be estimated as 
half sum of squares chosen randomly from the full set. 
The approximate dispersion of this estimate can generally 
be computed from any available estimate of the original 
dispersion of yi , and in particular from S 2 itself: 



T>(S 2 



2{K-L)S 2 
KL 



(A7) 



Of course, for a simple estimation of the dispersion, our 
scheme is rather cumbersome, but as we will see later, 
it can be very useful in more complicated situations. In 
mathematical statistics the estima tes of type (A2) are 
known as U-statistics (Hoeffding, 1948| ) and estimates 
based on subsampling as incomplete U-statistics (Blom, 
i976|). 



J. Pelt et al.: Time delay controversy on QSO 0957+561 not yet decided 



11 



A. 2. Trend as nuisance 

Let us now consider the regression model 

Vi = g(U) + U. = 9i + £i,i = 1, 2, ...,N, 



(A8) 



where the U are randomly distributed time points, g(t) is 
some smooth but unknown function of time (trend), the yi 
are observed values and the e, are unknown observational 
errors. We are seeking an estimate for the dispersion of 
the €i. 

Normally the trend g(t) is approximated by some para- 
metric model and estimates for the e» and also for their 
dispersion are computed from residuals after some fitting 
procedure. But, as demonstrated below, the estimate for 
the dispersion can be computed without knowing the ex- 
act form of g{t). 

Let us first write down the general dispersion of the 
observed values 
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The last sum can be considered as small (at least for sam- 
ples containing enough observations) since trend and er- 
ror are not correlated. The value for the trend dispersion 
L>trend can, however, be significant. In order to get an esti- 
mate for the error dispersion S^ TI we must somehow elim- 
inate the trend dispersion from the general sum. From the 
discussion above we know that we are not forced to use all 
pairs of squares to get dispersion estimates. Let us choose 
only those pairs which have close enough observing times. 
Formally we can define a selection window 



v ' ' 3) lO ol 



t 3 \<6 
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(A10) 



for some specified maximum allowable lag 5, and enter it 
into the general sum 



S 2 (S) = - 
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This choice of a subsample from the total of squares is not 
random from the point of view of its construction, but it 
is random from the point of view of the statistics of errors. 
For a slowly varying trend g(t) we can assume that it 
is Lipschitz continuous with some maximum slope param- 
eter A 



\g{U)-g{tj)\ <A\U-tj\ 



(A12) 



and consequently the upper limit for the dispersion of the 
trend part can be estimated by 
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The last limit is quite conservative, so that it should be 
used only for rather crude guesses. Fortunately, the second 
sum can also be computed from the time point sequence. 
For every fixed value of S there are 
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pairs of squares to be taken into account when estimating 
the dispersion. The error from subsampling increases with 
decreasing L (and of course with decreasing 5). But the 
bias due to the trend part of the observed values (mod- 
elling error) decreases with decreasing S. Consequently 
there is a certain value for S where both errors are of com- 
parable magnitude. Let us call this tradeoff value optimal 
(see Fig. ||). The corresponding S 2 (S) can be used as an 
estimate for the original error dispersion. 

There is another well known procedure to choose a 
subsample of squared differences. We can select only pairs 
which consists of neighbouring observations in the original 
data. The corresponding estimate for dispersion is now 



5- 
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In this estimate we avoided introduction of any free 
parameters into the selection scheme. Similar estimates 
were proposed by different authors (see von Neumann 
et al. 1941 or in the context of periodicity search 



Lafler&Kinman 1965 ). 

It is also possible to combine two subsampling schemes 
given above. In this case only neighbouring pairs of obser- 
vations Hi,yj for which |t» — tj\ < 5 are included into the 
subsample. Thereby we avoid additional scatter in our es- 
timates due to the long gaps in the data set. However, ev- 
ery additional restriction for selected pairs decreases their 
total number and consequently increases the subsampling 
error. 
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